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Abstract. Bayesian analysis of LISA data sets based on Markov chain Monte 
Carlo methods has been shown to be a challenging problem, in part due to the 
complicated structure of the likelihood function consisting of several isolated local 
maxima that dramatically reduces the efficiency of the sampling techniques. Here 
we introduce a new fully Markovian algorithm, a Delayed Rejection Metropolis- 
Hastings Markov chain Monte Carlo method, to efficiently explore these kind 
of structures and we demonstrate its performance on selected LISA data sets 
containing a known number of stellar-mass binary signals embedded in Gaussian 
stationary noise. 



PACS numbers: 04.80.Nn, 02.70.Uu, 07.05.Kf 
1. Introduction 

Surveys of our galaxy in the gravitational-wave band with the Laser Interferometer 
Space Antenna, LISA [Tj will yield the largest sample of close stellar-mass binary 
systems, in particular white dwarfs. LISA is in fact expected to individually resolve 
~ 10 4 stellar-mass binary systems that are sufficiently bright to stand above the 
confusion noise produced by the whole population of binaries, most of which are 
unresolved [2j[3]. Gravitational waves from stellar-mass compact objects in the low- 
frequency observational window arc long lived, and in order to extract the maximum 
amount of information, one needs to be able to disentangle and accurately measure the 
parameters of a large number of sources overlapping in time and frequency space. This 
represents a challenging data analysis problem, whose successful solution is essential 
for the full science exploitation of the LISA data set. 

Due to the large number of sources to be analysed at the same time and the 
fact that this number is unknown, Markov chain Monte Carlo (MCMC) techniques 
HI El O [9] and their extension to Reversible Jump Markov chain Monte Carlo 
(RJMCMCs) [lOlIITjrrj] are expected to be one of the most powerful search methods, 
although analysis approaches based on matched-filtering with a template bank have 
also been explored [13l [14] . 
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In MCMC applications to LISA data analysis, one of the key problems that one 
needs to tackle is the fact that the joint posterior probability density function (PDF) 
of the source parameters - the target distribution - presents a multimodal structure, 
often with secondary maxima well separated from the mode, see Figure [1] for an 
example. In such situations, maintaining a high acceptance ratio while sampling the 
full structure of the posterior PDF becomes very difficult, and may lead to a negligible 
efficiency of the algorithm. Several solutions have been proposed to increase the 
mixing and exploration ability of the chains, such as using parallel chains exploring 
the parameter space that, at a certain point, can be swapped (e.g. Metropolis- 
coupled MCMC algorithms [H]), or other methods like simulated annealing [16] or 
simulated tempering [17l[T8] that consist in adding a 'temperature' factor to the target 
distribution in order to make it smoother at the beginning and therefore to promote 
the movement of the chain. Examples of the use of some of these techniques to 
compute the posterior PDF for white dwarf binaries in LISA data sets are given in e.g. 
Refs. [U 03 [H [TTJ rT2] . However, as the number of dimensions increases and the target 
distribution becomes more complicated, it is not clear which technique works best. For 
instance, running N parallel chains requires N times more computing time; moreover, 
it has been shown that in some cases, this approach may yield low efficiency |19j . 
Non-Markovian explorations during an initial "search phase" have been considered to 
quickly identify the neighbourhood of the main mode of the distribution; however these 
strategies suffer from the significant draw-back that they cannot be readily extended 
to the relevant case of an unknown number of dimensions (i.e. unknown number of 
signals) and as a consequence model selection. For all these reasons, it is useful to 
explore alternative approaches. 

In this paper we explore a different strategy for a fully Markovian sampling of 
multimodal posterior PDFs: a Delayed Rejection (DR) Metropolis-Hastings Markov 
chain Monte Carlo method. Delayed Rejection was originally introduced by Tierney 
and Mira [20] [21] for fixed dimension problems and then extended by Green and 
Mira 22J to transdimensional problems. The algorithm is now found in a wide variety 
of science applications, see e.g. Refs. [23] [2H [55] [2E1 [57] [25]- In general, in all 
applications so far, either the number of steps was very small (2 or 3) or the assumption 
of symmetrical proposals is made. We have recently extended the DR algorithm to its 
most general scheme, using an arbitrary number of stages |29| . Here we demonstrate 
its power in applications to the LISA white dwarf problem, by applying it to the 
analysis of data sets in the restricted case in which the dimensionality of the problem 
(that is proportional to the number of white dwarf presents in the data set) is fixed 
and known. 

We would like to emphasise that although the waveforms that we consider here 
are the simplest present in the LISA data set, this specific example tackles all the 
conceptual issues that will be present in other LISA data analysis problems involving 
more complex signals, such as those from spinning massive black hole binaries and 
extreme-mass ratio inspirals. The DR scheme will therefore be applicable to the 
development of MCMC analysis algorithms targeted at the study of those sources. 
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2. Delayed Rejection Markov chain Monte Carlo methods 

2.1. The general method 

The Metropolis-Hastings algorithm is widely used in applications of Markov chain 
Monte Carlo methods to sample a target distribution x(x), in our case the posterior 
PDF of a set of parameters. Given a state of the Markov chain A, one proposes a 
new state A' drawn from a proposal distribution (or transition kernel) q(X, A'), the 
probability of A' given A. This new state is accepted with probability 

\(X')q(X',X)\ 



a(X, A') = 1 A 



(1) 



7r(A)?(A,A') J 

and the chain remains at A with probability 1 — a(A, A'). In the previous equation, 
the notation a A b (for any real numbers a and b) stands for the minimum between a 
and b, and as a consequence the acceptance probability is limited to unity in the case 
where the ratio is > 1. 

An effective way to improve the efficiency of the MCMC algorithm has been 
proposed in the form of Delayed Rejection [20] [2TJ [22] , and now we briefly summarise 
it. Suppose that the current state of the chain is A. A candidate, Pi, is generated 
from a proposal distribution gi (A, /3i) and accepted with probability 

\(Pl)qi0l,X)\ 



ai(A,/3i) = 1 A 



(2) 



L 7r(A)ft(A,A) J 

as in a standard Metropolis-Hastings algorithm, see Equation If j3\ is not accepted, 
instead of rejecting it, one can use this information and propose a new state p 2 from a 
(in principle different) proposal distribution (72 (A, Pi, P 2 ) which may use information 
about the rejected state Pi. 

Moreover, in general the i— th stage of the DR chain is as follows [29]: if the state 
Pi-l is proposed and rejected, one can propose a new candidate Pi from the proposal 
Qi(X, fii, . . . , Pi) and accept it with probability 

\0i) qi0uPi-i) qiiPiJi-uPi-z) ... qi(PiJi. 



ai(X,Pi 



,A) = 1A 



,A) 



Tr(X)q 1 (X,p 1 )q 2 (X,p ll p 2 ) ... ?i (A, ft,..., A) 



1 - ai(Pi,Pi-i) 


1 - a 2 (Pi, Pi-i, Pi-2) 




1 - CCi-l(Pi, ...pi) 




l-ai(X,&) 


l-a 2 (X,p 1 ,p 2 ) 




1 - aj_i(A, . .-Pi-i) 





.(3) 



Notice that with the notation we are using, qi(X, Pi, . . . , pi-i, Pi) represents the 
proposal probability of Pi given {A,/3i, . . . ,/3i_i} where the order of the parameters 
does matter. By imposing detailed balance at each stage and deriving the acceptance 
probability that preserves it, the resulting chain will be a reversible Markov chain with 
invariant distribution ir. 

From a theoretical point of view, it can be demonstrated in general (see Sec. 2.2 of 
Ref. [29] ) that the variance of an estimate made from a chain using Delayed Rejection is 
always smaller than that produced with a standard Markov chain. Moreover, besides 
increasing the efficiency of the sampling, this method allows us to propose a new 
candidate using the information of the past elements of the chain. We will profit from 
these two properties in increasing the efficiency of the computation of the marginalised 
posterior distributions generated by integrating multimodal likelihood functions. 
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2.2. Key rules on the choice of proposals 

An essential point of the DR scheme is that Eq. §5§ has been derived by imposing 
that the backward path, from /3, to A, follows the forward path, from A to 
in reverse order. As a consequence, this property forces one to choose proposal 
distributions that preserve, in some way, the reversibility of the chain in order to avoid 
negligible acceptance probabilities. Full details about the general implementation of 
the algorithm are provided in Rcf. [29]; here we focus on the choice of proposals 
relevant to searches for white dwarf binary systems. 

When we run a Metropolis-Hastings MCMC chain - likely exploring a secondary 
mode of the distribution - we can start a DR chain with a large jump that attempts to 
reach the mode of the distribution; such transition, likely to be rejected but hopefully 
closer to the mode, is followed by many small ones in order to explore the new region 
of the parameter space. In terms of the proposal PDFs that appear in Eq. , this 
translates into having gi(A, = q a (X, f3\) mainly proposing large transitions in the 
parameter space, and qj(X, j3i, . . . , (3j) = qb{x[X, . . . , Pj-i], Pj) for j > 2, proposing 
small transitions. This choice inevitably adds some degree of non-reversibility that 
significantly degrade the acceptance probabilities, see Eq. ([3]). One could however 
achieve an efficient DR algorithm, by using proposals that allow for both small and big 
jumps, such as a mixture of three Gaussians (3-Gaussian, from now on) symmetrical 
functions: 

q a ,b{x,x) = N a ,b G{x,ay)x)+- — [G(x - /i, o~2 ; x) + G(x + fi,a 2 ;x)} ; (4) 

in the previous equation, x may be a function of the old elements of the DR chain, 
N a ,b represents the probability of proposing a value from the central Gaussian and 

G(fj,,a;x) = exp — ^ x 2 Jz • Notice that an efficient exploration of multimodal 
structures could in principle be achieved by taking N a — > and AT b — >• 1, but this 
in general would lead to negligible acceptance probabilities. In addition, the central 
value of the 3-Gaussian, x[X, . . . , Pj-i] that is computed at each DR stage making use 
of the information from the old elements of the chain, should be independent of the 
particular sample of the chain. In our application we use the mean of all the past 
elements of the chain after the big jump, 

x[X,[3i, . . = mean(/3i, . . . . (5) 

With these choices, all the proposals used during the DR are characterised by 
6 parameters: {eri, 02, Mi N a , Nb} and the maximum number of elements in a DR 
chain, noR! as a consequence, the contribution of the ratio of proposals to the final 
acceptance probability, Eq. ([3|), can be quantified (see Figs. 4 and 6 of [29]) in terms 
of these parameters. The choice of these parameters is the essential step in obtaining 
a DR algorithm that probes in an efficient way the posterior PDF. In practice, they 
can be chosen as follows: (i) {eri, 02, m}i are chosen to have a proposal adapted to the 
multimodal structure of the target distribution, [i being the typical distance between 
two maxima of the target distribution and o~i their typical width; as consequence a 
qualitative understanding of the typical scales in parameter space of the likelihood 
function is important (we deal in detail with this point in the next Section) ; (ii) from 
{<Ti, 02, /i}, one can then identify the panels from Figs. 4 and 6 of [53] corresponding to 
the closest ai/fj, — er 2 //U values; first, one looks at the losses due to the central proposal 
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evolution (CPE) (Figure 6 of [22]) to select Nf, high cnougrjj] to have acceptable 
expected values of this ratio of proposals; as one normally works with nrjR > 100, 
the results depend very weakly on nrjR; (iii) one then needs to consider the losses 
due to an asymmetrical proposal (AP) (Figure 4 of |29j ) in order to choose N a ; (iv) 
finally, as again the dependency on «dr is weak, its choice will be mainly driven by 
the constraints on the computational costs. 

So, by following the prescriptions described here and having some previous 
knowledge about the multimodal structure of the target distributions that has to be 
sampled, one can implement a DR MCMC algorithm with an arbitrary numer of steps, 
nrjR, capable of efficiently sampling these multimodal distributions. The algorithm is 
completely general, and can be applied to a wide range of problems. In the following 
sections we will focus on the application of the DR MCMC algorithm to the search of 
a known number of stellar-mass binary signals in Gaussian and stationary noise with 
LISA. 



3. Application to LISA observations of white dwarfs 

We consider a LISA data set - that we indicate with dA,E, to highlight the use of 
the two noise-orthogonal Timc-Dclay-Interfcrometry observables A and E, see e.g. 
Ref. |30j for a review - containing a known number, N, of signals from stellar-mass 
binary systems {Ha,e) buried in Gaussian stationary noise (tia,e)'- 

dA,E{t) = n A ,E(t; Cn) + h A ,E(t; Cs) ■ (6) 

The vector C = {Cn,Cs} contains all the M unknown parameters that describe the 
problem, where Cn and Cs are the noise and signal parameters, respectively. For the 
specific case considered here, we focus on monochromatic (in the source reference 
frame) white dwarfs and purely Gaussian and stationary (with zero mean and variance 
er^) instrumental noise. The total number of parameters that describe the problem - 
i. e. the dimensionality of C — is M = 1 + 7N 

[ A x A x 

Cs = Ufo,A+,-^,(j),B,'ip,tpo}i,{fo,A + ,-^-,(t),9,'tp,ipo}2,. 

tn = ° 2 n, (7) 

where /o is the constant frequency of the signal (with respect to the Solar System 
Barycentre), A+ and A* are the amplitudes of the two GW polarizations, the 
longitude </> and co-latitude 9 are the two angles that define the source's sky location, 
tjj is the polarization angle of the gravitational wave and tpo is a constant that fixes 
the initial phase of the signal. 

We make use of the Delayed Rejection MCMC algorithm introduced in the 
previous Section (see for more details) to compute the (marginalised) posterior 
PDFs of the noise and signal parameters. By applying the Bayes' theorem, we can 
compute the joint posterior density function, p(C|cZ), of C given the data sets, d, 

I The tolerance range depends on the typical ratio between the likelihood of the main maximum and 
the likelihood of the secondary ones. In our experience, allowing losses of the order of e~ 3 — e -4 is 
acceptable. 
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where C(d\£) is the likelihood function. the prior probability density function, 
and p(d) is the marginal likelihood. From a Markov chain Monte Carlo algorithm 
perspective, the target distribution that needs to be sampled is 7r(£) = p(£) C(d\^) oc 
p(C\d). Since the resulting chain is a Markov chain with invariant distribution ir, the 
average along a sample path of a function / is an asymptotically unbiased estimate of 
J f(x)n(x)dx; in particular, the marginalised posterior PDF over a set of parameters, 
can be directly extracted from the output Markov chain of the algorithm. 

3.1. Structure of a single source target distribution 

The application of the DR algorithm requires some previous knowledge of the 
multimodal structure present in the target distribution of the problem at hand. 
This structure is intrinsic to the single source posterior PDF and, in particular, to 
its likelihood function, C(d\Q. Local maxima [3] occur in the likelihood surface 
at approximately multiples of the modulation frequency (/ m = lyr -1 ) around the 
frequency of the signal measured at the Solar System Barycentre. Although the main 
multimodal structure occurs in the frequency parameter, other parameters, like the 
sky position, are correlated to the frequency, and this is reflected in the structure of 
secondary maxima of the likelihood function. 

In Fig. [T] we explore the structure of the likelihood function in the 3-dimensional 
space {/, 9, <fi}, around the global maximum and the secondary mode, for three 
different frequency values. It can be observed how local modes of the maximised 
likelihood function appear around the actual frequency value at distances that are 
multiples of /.,„ and, at the same time, the sky location is affected. As the signal 
frequency increases, a smaller change in the sky location can produce the same 
frequency shift due to the Dopplcr effect produced by the LISA motion, which explains 
the finer structures in the panels at the bottom of Fig. [TJ In order to have a 
more systematic and quantitative understanding of the multimodal structure of the 
likelihood function, we have performed 500 Monte Carlo (MC) simulations around 
different frequencies, taking random values for all the 7 parameters that characterise a 
single white dwarf binary signal, and have computed the relative position (in parameter 
space) of the secondary maxima with respect to the main mode. The results of this 
Monte Carlo's are summarised in Fig. [21 from those we can make a suitable choice of 
the parameters {<Ji,a2,n} that determine the proposals of the DR stage. 

Some general conclusions, that are important for the tuning of the DR algorithm 
can be drawn. The frequency separation between the two modes appears to 
be independent of the actual frequency of the signal and the values are mostly 
concentrated between 0.75/ m and 1.75/ m . Moreover, we can estimate the angular 
distance on the two-sphere, AO, between the sky position corresponding to the main 
and the secondary maximum. From the central plot of Fig. [2] we observe that AO 
progressively shifts to lower values as the signal's frequency increases, and that (right- 
hand plot of Fig. [2]) the mean value of AO changes with the co-latitude, 6, producing 
secondary maxima that are more separated near the ecliptic plane than at the poles. 
A qualitative understanding of the dependency observed in Figure [2] can be simply 
derived from the expression of the Doppler shift, (j)D,i{t) = 27r/o/c k ■ Vi(t), where k is 
the unit vector from the source to the detector and Vi is the velocity vector of the i-th 
LISA spacecraft. From here, we observe that the required changes in the sky location 
needed to obtain a certain frequency shift, will be inversely proportional to /q. 
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Figure 1. Colour maps with the marginalised likelihood function in the co- 
latitude (8) — longitude (<j>) plane at two particular frequency values corresponding 
to the main (left-hand plots) and secondary maxima (right-hand plots); and, linear 
plots in the top panels with the marginalised (lower solid line) and maximised 
(higher solid line) likelihood as a function of the difference between the signal's 
frequency, /, and its actual value reported in the key file, fkey, measured in units 
of the LISA'S modulation frequency, / m = lyr - 1 . These plots were repeated for 
three frequencies. The dashed lines in the colour maps represent the actual sky 
location of the source, meanwhile the dashed line in the top panels represents the 
frequency value where the colour plots were generated. 
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Figure 2. Results obtained after performing 500 MCs over all the 7 parameters 
that characterise the GW signal measured from a stellar-mass binary system 
and studying the relative position of the secondary maxima with respect to the 
main one. The frequency values are restricted to narrow bands around lmHz 
(dash-dot), 3mHz (solid) and lOmHz (dotted), (a) The left-hand plot shows the 
distributions of relative distance in frequency between the two maxima and in 
units of the LISA'S modulation frequency, f m = lj/r -1 ; (b) the central panel 
plots the distributions of the spherical angular distance ; (c) the right-hand plot 
shows the dependency of these averaged values of angular distance with the co- 
latitude of the source. 



3. 2. Parameters for the Delayed Rejection proposal distributions 

The multimodal structure of the target distribution relevant for a search for stellar- 
mass binaries in the LISA data is mainly present in the frequency of the signal, 
although there are correlations with the sky location angles. The solution that we 
have adopted here, as a first implementation of the DR algorithm, consists in applying 
a scheme where only these three parameters, {/, #,</>}, are updated: the first one, /, 
using a mixture of three Gaussian distributions and the two angles defining the sky 
location drawn from a single Gaussian proposal centered at the previous element of the 
DR chain and width according with the results obtained in Fig. [5] With this, only the 
frequency proposals can contribute with small factors to the acceptance probability of 
Eq. ([3]); so we will now focus on these proposals. 

By following the prescription given in Sec. 12.21 we select the first three parameters, 
{(71,02, m}j that characterise the 3-Gaussian distribution for the frequency. In light 
of the results shown in Figs. [1] and [21 our proposals can have the modes separated by 
[i = 1.25/ TO , choosing a o~\ = f m /2 wide central Gaussian when exploring local modes 
and CT2 = 0.2/ m for the external Gaussians. This means that we propose from the 
external Gaussians, 95.4% of the values in the (0.85/ m , 1.65/ m ) interval. 

Moreover, by looking at the corresponding panels of Figures 6 and 4 of [29) (in 
our case, o~i/fx = 0.4 and o^/a* = 0.16), one can read out the values for Nb and N a . 
Finally, the maximum number of iterations allowed in a Delayed Rejection is fixed 
mainly for computational reasons; from our experience we think that riDR = 1000 to 
3000 are adequate values, depending on whether one prefers to attempt more frequent 
short DR chains, or longer chains less frequently. In summary, a DR algorithm for 
efficiently explore the joint posterior PDF in the frequency space can be defined by 
the following parameters: 

{ai,<T 2) jU, N a , N b , n BR } = {0.5/ m , 0.2/ m , 1.25/ m , 0.15, 0.95, 1000} (9) 
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Figure 3. Output chains obtained after running the DR MCMC algorithm 
presented in this article over a LISA data set containing 3 sources around l.lmHz 
buried in Gaussian stationary noise, two of them separated in frequency by 2f m , 
fm being the frequency modulation of LISA, and the third one located at A0f m 
from those two. All the other parameters are chosen randomly. The dashed lines 
represent the actual values of the parameters for each source and we use solid 
lines of different colours to represent the parameters from different chains. 
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Figure 4. The same plots as in Fig.|3]but here working with a 5 sources data set 
around 1.6mHz, with a separation of 13/ m between each source. Only the third 
chain (blue in the on-line version) converged to a wrong sky location (the opposite 
to the actual one) because of degeneracies present in the likelihood surface and 
its proximity to one of the poles. 



Delayed Rejection Markov chain Monte Carlo approach for LISA analysis 



10 



3.3. Results 

We have implemented a DR MCMC algorithm to search for a known number of stellar- 
mass binary signals in LISA data, assuming Gaussian and stationary noise. In this 
section we provide example results obtained by applying the analysis method on two 
different 1-yr long data sets: (i) one containing three sources at 1.1 mHz separated in 
frequency by 2/ m and 40/ m , see Fig. El and (ii) the other including 5 sources around 
1.6 mHz equally separated in frequency by 13/ m , see Fig. 21 From these figures we 
can see that after the burn-in period - we note that the DR part is switched on only 
after burn-in - that has been discarded, most of the chains are sampling either the 
actual mode of the signals or secondary maxima. At this point, the Delayed Rejection 
is switched on (with probability 1/3000) and from time to time a new region w l/ m 
away from the current position of the chain, is explored searching for higher maxima. 
We observe that once the chain finds the right mode in the frequency space, all the 
other parameters also move their actual values. 

With the DR, we are drawing «dr = 1000 proposals mainly targeted to a different 
mode of the posterior PDF with a probability of 1/3000, most of which are rejected. 
By keeping the same odds with a standard MCMC, we would have the same ability 
of exploring different maxima but this algorithm would yield an extremely correlated 
Markov chain, i. e. greater variances of the estimates once it has reached the stationary 
distribution. On the other hand, one could use a standard MCMC proposing big jumps 
in the parameter space with probability 1/3000, and the resulting Markov chain would 
have the same autocorrelation as the one obtained from a DR MCMC; however, the 
ability to explore different maxima of the target distribution would decrease by a factor 
n-DR = 1000. We refer the reader to Ref. [55] for further discussions and examples. 

The results of Figs.[3]and[4]were obtained from a single Markov chain of 5 x 10 5 x N 
and 15 x 10 5 x N elements, respectively; where the first 25000 x N elements were 
discarded as a part of the burn-in period. The length of the chains is sufficient to 
demonstrate the ability of the DR MCMC algorithm to find multiple sources, but in 
a real problem, we would run longer chains in order to get accurate samplings of the 
posterior PDFs. 

In addition of the DR parameters, see Sec. 13.21 many others for the MCMC 
algorithms need to be chosen carefully. Here we summarise the most relevant for the 
present discussion. First of all, we are not always updating the same number of sources. 
We set probabilities of updating 1,2, ... ,N sources, always selecting the closest ones 
in frequency, since this is the main parameter that determines the degree of correlation 
between two sources: two sources with exactly the same parameters but separated by, 
say 1000/ m , are perfectly resolvable. Our proposal density functions during the normal 
MCMC evolution are single Gaussian distributions centered at the actual element of 
the chain, with constant values for the amplitude of the proposals, A», throughout 
the evolution of the chain; for the angles, the value of Aj was set to one-ninth of 
the prior range (which covered the entire physical space). For the other parameters 
we use A f = lf m = 3.17nHz, A Ay/A+ = 2/3, A log ^ + = 0.08 and A log(T = 0.01. 
We also consider updates of all the parameters at a time, but always working with 
diagonal variance-covariance matrices for the proposals. When the number of sources 
is greater than 1, the DR stage is attempted with all the selected sources at a time. 
We account for symmetries and degeneracies of the target distribution by proposing, 
with a given probability, special jumps based on our prior knowledge. Most of these 
'quasi-equivalent' points in the parameter space are actual degeneracies of the antenna 
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patterns that characterise the LISA response in the low frequency range [3T] , and that 
at higher frequencies appear as secondary maxima. When searching for more than one 
source, we also consider big jumps in the parameter space: single Gaussian proposals 
of width equal to one-third of the whole prior range for all the parameters. Wc found 
them necessary (see e.g. Fig. [4| because converged chains act like "walls" in the 
frequency space, decreasing the chances of other chains to cross over them. 

4. Conclusions and future work 

The motion of the LISA instrument during the observations of long lived GWs 
modulates the phase and amplitude of the detected signal, producing likelihood 
functions with a complicated structure consisting of several isolated local maxima. 
This is a common feature of all long lived LISA sources and, in turn, makes standard 
MCMC samplers inefficient. Here, wc have presented a new strategy for the fully 
Markovian sampling of multimodal posterior PDFs based on a Delayed Rejection 
MCMC method and we have demonstrated its performance by applying it on the 
search for multiple stellar binary systems in Gaussian and stationary noise. 

The Delayed Rejection technique introduced here, can be fruitfully applied 
in many other LISA data analysis areas involving more complex signals, such as 
those from spinning massive black hole binaries and extreme-mass ratio inspirals; 
furthermore, it needs to be extended to the case of trans-dimensional problems, 
relevant to the analysis of LISA data sets to search for the whole galactic populations of 
stellar-mass binary systems, where the number of GW sources in unknown. In fact, an 
additional benefit of having a fully Markovian algorithm capable of efficiently exploring 
the target distribution is the possibility of estimating the Bayes factor of different 
models directly from the output chains. We are currently exploring the possibility of 
combining the Delayed Rejection technique with a Reversible Jump MCMC algorithm 
in order to search for an unknown number of sources. 
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